Add stopword support to IntervalBuilder#39637
Conversation
|
Pinging @elastic/es-search |
jimczi
left a comment
There was a problem hiding this comment.
Thanks @romseygeek , I left some questions/comments
| if (spaces >= 0) { | ||
| if (synonyms.size() == 1) { | ||
| terms.add(synonyms.get(0)); | ||
| terms.add(extend(synonyms.get(0), spaces)); |
There was a problem hiding this comment.
The synonyms contains the word at the position before the gap so the extend should be applied forward ?
There was a problem hiding this comment.
It applies backwards because we're using PositionIncrement to detect when there's a gap preceding us in the TokenStream. If you've got a posInc of 2, that means theres a gap before you in the stream, so you need to extend backwards 1 to cover it.
There was a problem hiding this comment.
Ah, actually no, it's a bug - I added the test you suggested and it fails, have updated.
| } | ||
| else if (synonyms.size() > 1) { | ||
| terms.add(Intervals.or(synonyms.toArray(new IntervalsSource[0]))); | ||
| terms.add(extend(Intervals.or(synonyms.toArray(new IntervalsSource[0])), spaces)); |
There was a problem hiding this comment.
Same here, you should use extend(0, spaces) and not extend(spaces, 0) ?
| Intervals.extend(Intervals.term("term5"), 1, 0) | ||
| ); | ||
| assertEquals(expected, source); | ||
| } |
There was a problem hiding this comment.
Can you also add a test with a simple word synonym that starts after a gap ?
jimczi
left a comment
There was a problem hiding this comment.
LGTM, should it be considered as a bug fix and merged to 7.0 ?
+1, we'll generate incorrect queries otherwise |
The match interval builder analyses input text and converts it to an IntervalSource, and as such may generate token streams with stopwords. This commit deals with these by using the extend factory to cover the gaps produced by these stopwords so that phrase and ordered queries work correctly.
The match interval builder analyses input text and converts it to an IntervalSource, and as such may generate token streams with stopwords. This commit deals with these by using the extend factory to cover the gaps produced by these stopwords so that phrase and ordered queries work correctly.
The
matchinterval builder analyses input text and converts it to anIntervalSource, and as suchmay generate token streams with stopwords. This commit deals with these by using the
extendfactory to cover the gaps produced by these stopwords so that phrase and ordered queries work
correctly.